6 research outputs found

    A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    Get PDF
    Background: Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants. Results: In this study, we describe a combined imputation which uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach was demonstrated using a reference panel of 848 samples constructed using exome sequencing data from the T2D-GENES consortium and 5,349 sample genotype panels consisting of an exome chip and SNP chip. As a result, the combined approach increased imputation quality up to 11 %, and genomic coverage for rare variants up to 117.7 % (MAF < 1 %), compared to imputation using the SNP chip alone. Also, we investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best performing approach was the combination of the study specific reference panel and the genotype panel of combined data. Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhances both the imputation quality and genomic coverage of rare variants

    The distribution of standard deviations applied to high throughput screening

    Get PDF
    High throughput screening (HTS) assesses compound libraries for “activity” using target assays. A subset of HTS data contains a large number of sample measurements replicated a small number of times providing an opportunity to introduce the distribution of standard deviations (DSD). Applying the DSD to some HTS data sets revealed signs of bias in some of the data and discovered a sub-population of compounds exhibiting high variability which may be difficult to screen. In the data examined, 21% of 1189 such compounds were pan-assay interference compounds. This proportion reached 57% for the most closely related compounds within the sub-population. Using the DSD, large HTS data sets can be modelled in many cases as two distributions: a large group of nearly normally distributed “inactive” compounds and a residual distribution of “active” compounds. The latter were not normally distributed, overlapped inactive distributions – on both sides –, and were larger than typically assumed. As such, a large number of compounds are being misclassified as “inactive” or are invisible to current methods which could become the next generation of drugs. Although applied here to HTS, it is applicable to data sets with a large number of samples measured a small number of times

    Rare variants in PPARG with decreased activity in adipocyte differentiation are associated with increased risk of type 2 diabetes

    Full text link

    Uncertainty quantification in skin cancer classification using three-way decision-based Bayesian deep learning

    No full text
    Abstract Accurate automated medical image recognition, including classification and segmentation, is one of the most challenging tasks in medical image analysis. Recently, deep learning methods have achieved remarkable success in medical image classification and segmentation, clearly becoming the state-of-the-art methods. However, most of these methods are unable to provide uncertainty quantification (UQ) for their output, often being overconfident, which can lead to disastrous consequences. Bayesian Deep Learning (BDL) methods can be used to quantify uncertainty of traditional deep learning methods, and thus address this issue. We apply three uncertainty quantification methods to deal with uncertainty during skin cancer image classification. They are as follows: Monte Carlo (MC) dropout, Ensemble MC (EMC) dropout and Deep Ensemble (DE). To further resolve the remaining uncertainty after applying the MC, EMC and DE methods, we describe a novel hybrid dynamic BDL model, taking into account uncertainty, based on the Three-Way Decision (TWD) theory. The proposed dynamic model enables us to use different UQ methods and different deep neural networks in distinct classification phases. So, the elements of each phase can be adjusted according to the dataset under consideration. In this study, two best UQ methods (i.e., DE and EMC) are applied in two classification phases (the first and second phases) to analyze two well-known skin cancer datasets, preventing one from making overconfident decisions when it comes to diagnosing the disease. The accuracy and the F1-score of our final solution are, respectively, 88.95% and 89.00% for the first dataset, and 90.96% and 91.00% for the second dataset. Our results suggest that the proposed TWDBDL model can be used effectively at different stages of medical image analysis

    A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

    No full text
    10.1186/s12864-015-2192-yBMC Genomics161110
    corecore